Analysis of Speech Rhythm for Language Identification Based on Beat Histograms

Authors

  • Athanasios Lykartsis
  • Alexander Lerch
  • Stefan Weinzierl
Abstract

Rhythm is a basic property of acoustic signals [1][2], and its perception is presumed to share a common basis in speech and music [3], which hints at a similarity that can be tracked in the acoustic signals as well. For speech signals, rhythm analysis can provide relevant conclusions both with respect to linguistic questions (e.g. language rhythm typology) and for applications in speech technology (e.g. in multilingual dialogue systems). However, speech rhythm is difficult to analyze, since neither its modeling nor its measurement is straightforward. In phonetics, speech rhythm has mainly been measured with statistical measures (known as rhythm metrics) that capture the patterns of intervals of and between salient speech elements such as vowels, consonants and syllables. Such metrics include the standard deviation of consonant intervals ∆C, the percentage of vocalic intervals %V and the Pairwise Variability Index (PVI) [4][5][6]. Although they have been used extensively for speech rhythm description and the investigation of rhythmical differences between languages, those measures have also been criticized [7] for lack of robustness and for producing inconsistent results with respect to the rhythm class hypothesis, which states that languages belong either to a stress-timed or to a syllable-timed group [8]. Further problems are that the analysis requires manual or automatic annotation of speech elements, and that rhythm description focuses only on high-level language elements (such as syllables or consonants and vowels) and their duration patterns instead of examining directly measurable signal properties. Various technical attempts to model rhythm were also undertaken in the field of rhythm-based language identification (LID).
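The rhythm metrics named above can be illustrated with a minimal sketch. It assumes that vocalic and consonantal interval durations (in seconds) are already available from an annotation; the function names are hypothetical, the use of the population standard deviation is an assumption, and the PVI is shown in its commonly used raw and normalized forms rather than in any variant specific to this paper.

```python
from statistics import mean, pstdev


def delta_c(consonant_durations):
    """Standard deviation of consonantal interval durations (Delta C).

    Assumption: population standard deviation; some studies may use
    the sample standard deviation instead.
    """
    return pstdev(consonant_durations)


def percent_v(vowel_durations, consonant_durations):
    """Percentage of total duration taken up by vocalic intervals (%V)."""
    total = sum(vowel_durations) + sum(consonant_durations)
    return 100.0 * sum(vowel_durations) / total


def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return mean(diffs)


def normalized_pvi(durations):
    """Normalized PVI: each difference is divided by the pair's mean
    duration, and the average is scaled by 100."""
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * mean(terms)


# Hypothetical interval durations in seconds, for illustration only.
vowels = [0.1, 0.2, 0.1]
consonants = [0.05, 0.15]
metrics = {
    "delta_c": delta_c(consonants),
    "percent_v": percent_v(vowels, consonants),
    "raw_pvi": raw_pvi(vowels),
    "normalized_pvi": normalized_pvi(vowels),
}
```

All four functions depend on segment boundaries being known in advance, which is the annotation requirement criticized in the text.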
A number of studies [10][11][12][13] have extracted rhythmic units by automatically segmenting speech into pseudosyllables (structures of the form CV, where C is a consonant and V a vowel) and calculating parameters concerning the duration and properties of speech elements, such as fundamental frequency or energy. Such studies have achieved satisfactory results (60–80%) in rhythm-based LID for a number of speech corpora, which shows the importance of rhythm- and prosody-based features for the LID task. They still, however, bear the disadvantage of relying on higher-level language units such as syllables to extract speech rhythm. In order to overcome these problems, we propose an alternative approach for rhythm extraction and modeling for LID. We draw inspiration from the field of Music Information Retrieval (MIR), where there have been numerous approaches to rhythm extraction, for instance for the problem of automatic musical genre classification. One of the widely used representations is the beat histogram, which has emerged as a method for rhythmic content description for audio classification and has been described in [15][16][17]. Its basic premise is that the rhythm of an audio excerpt can be described by creating a representation of the distribution of its periodicities in a very low frequency range and extracting relevant statistical and other properties from it. A similar approach has recently been presented by Tilsen & Arvaniti [9], who modelled speech rhythm by extracting periodicities from the signal envelope and analyzing their relationships. This paper describes the use of the beat histogram for the creation of speech rhythm features for LID, using several relevant signal properties as the basis for its creation. The goals are the evaluation of those novel features for rhythm-based LID and the analysis of speech rhythm through investigation of the rhythm class hypothesis.
In the following, the methods for speech rhythm feature extraction are described. The classification setup with two supervised learning algorithms as well as the experimental results for one multilingual speech corpus are presented and discussed.
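As a rough illustration of the beat histogram premise described above, the following sketch estimates the distribution of low-frequency periodicities from the autocorrelation of a frame-wise RMS envelope. It is not the authors' method: the function name `beat_histogram`, the choice of RMS energy as the underlying signal property, the 10 ms frame length and the 0.5 to 10 Hz range are all assumptions made for the example.

```python
import numpy as np


def beat_histogram(signal, sample_rate, frame=0.01, f_min=0.5, f_max=10.0):
    """Return (frequencies in Hz, periodicity strengths) of the envelope.

    Assumptions: RMS energy per non-overlapping frame serves as the
    envelope, and periodicity is measured by its autocorrelation.
    """
    hop = int(round(frame * sample_rate))
    n_frames = len(signal) // hop
    frames = np.reshape(signal[:n_frames * hop], (n_frames, hop))

    # Slowly varying amplitude envelope, with the mean removed.
    envelope = np.sqrt(np.mean(frames ** 2, axis=1))
    envelope = envelope - envelope.mean()
    env_rate = sample_rate / hop

    # Autocorrelation for lags 0 .. n_frames - 1, normalized by lag 0.
    acf = np.correlate(envelope, envelope, mode="full")[n_frames - 1:]
    acf = acf / acf[0]

    # Convert each lag (in frames) to a periodicity frequency in Hz.
    lags = np.arange(1, n_frames)
    freqs = env_rate / lags
    keep = (freqs >= f_min) & (freqs <= f_max)

    # Negative correlations carry no periodicity evidence; clip to zero.
    return freqs[keep], np.clip(acf[1:][keep], 0.0, None)


# Synthetic test signal: a 200 Hz tone amplitude-modulated at 4 Hz.
sample_rate = 8000
t = np.arange(4 * sample_rate) / sample_rate
tone = (1.0 + np.cos(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 200.0 * t)

freqs, strengths = beat_histogram(tone, sample_rate)
dominant = freqs[np.argmax(strengths)]  # strongest periodicity in Hz
```

Statistical descriptors of `strengths` (for example its mean, spread, or the position of its largest peak) could then serve as rhythm features for a classifier, in the spirit of the "statistical and other properties" mentioned in the abstract.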


Similar resources

Using the beat histogram for speech rhythm description and language identification

In this paper we present a novel approach for the description of speech rhythm and the extraction of rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm through the calculation of features based on salient elements of speech such as consonants, vowels and syllables. We present how an automatic rhythm extraction method borrowed from ...


Unsupervised Classification of Music Signals

This thesis describes the ideal properties of an adaptable music classification system based on unsupervised machine learning, and argues that such a system should be based on the fundamental musical properties of timbre, rhythm, melody and harmony. The first two properties and the signal features associated with them are then explored in more depth. In the area of timbre, the relationship betw...


Delayed Referral in Children with Speech and Language Disorders for Rehabilitation Services

Objectives: Speech and language development is one of the main aspects of evolution in humans and is one of the most complex brain functions such that it is referred to as one of the highest cortical functions such as thinking, reading and writing. Speech and language disorders are considered as a major public health problem because they cause many secondary complications in the childhood and a...


Evidence for Multiple Rhythmic Skills

Rhythms, or patterns in time, play a vital role in both speech and music. Proficiency in a number of rhythm skills has been linked to language ability, suggesting that certain rhythmic processes in music and language rely on overlapping resources. However, a lack of understanding about how rhythm skills relate to each other has impeded progress in understanding how language relies on rhythm pro...


Comparison of spectral methods for spoken language identification

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...



Publication date: 2015